On the Dynamics of the h-index in Complex Networks with Coexisting Communities
This article investigates the evolution of the h-index in a complex network including two communities (in the sense of having different features) with the same number of authors, whose yearly productions follow Zipf's law. Models considering indiscriminate citations, as well as citations preferential to the fitness values of each community and/or to the number of existing citations, are proposed and numerically simulated. The h-indices of each type of author are estimated along a period of 20 years, while the number of authors remains constant. Interesting results are obtained, including the fact that, for the model where citations are preferential to both community fitness and number of existing citations per article, the h-indices of the community with the larger fitness value are only moderately increased, while the indices of the other community are severely and irreversibly limited to low values. Three possible strategies for changing this situation are discussed. In addition, based on such findings, a new version of the h-index is proposed, involving the automated identification of virtual citations, which can provide a complementary and unbiased quantification of the relevance of scientific works.

PACS numbers: 89.75.Hc,01.75.+m,01.00.00,01.30.-y,07.05.Mh 



'The only factor becoming scarce in a world of abundance is human attention.' (Kevin Kelly, Wired)



I. INTRODUCTION 

It all started in darkness and mystery. In the beginnings of humankind, the explanation of the world and the prediction of the future lay deep in the impenetrable realm of sorcerers and medicine men. Except for the extremely rare initiates, the inner, ambiguous workings of divination and sorcery were jealously guarded. Similar secrecy was observed through much of the subsequent history of humankind, including the age of oracles in the classical world and alchemy all through the Middle Ages. 'Knowledge' was not for everybody; it was the power source of a few. Ultimately, the value of those practices did not stem from their effectiveness, but emanated from all types of symbology, dogmas, metaphors and ambiguities.

With time, some light was shed, and part of humankind finally realized the value of confronting explanations and predictions with reality through experimentation. That such a basic fact would take such a long time to be inferred provides a lasting indication of the inherent limitations of human nature. Be that as it may, the value of experiments finally established itself, from the Renaissance up to the present day. Such an essential change was accompanied by another important fact: it became progressively clearer that, once widely disseminated, new findings catalyzed still more discoveries. The popularization of printing techniques contributed substantially to implementing this new philosophy, which was steadily crystallized into an ever-growing number of books, and then journals and WWW files. One of the immediate consequences of the first scientific papers was the unavoidable appearance of citations. Today, citations and impact factors (calculated by taking into account citations, as well as other indicators) are widely used, to the happiness of some and the chagrin of others, for quantifying the quality and productivity of researchers, groups, institutions and journals.

Scientific indices are now regularly applied in order to decide on promotions, grants and the identification of scientific trends. In this way, science has become, to a large extent, driven by scientometrics. However, it is important not to forget the initial purpose of scientific publishing: fostering the dissemination of high-quality knowledge and results for the benefit of humankind. One important point to be borne in mind is that all existing scientific indices are biased in some specific way. For instance, the total number of articles published by a researcher is not necessarily related to their productivity unless their age (or seniority) is taken into account. At the same time, the number of citations received by a work or author is also relative, because this number can depend on joint authorship and on the specific area, or can even be a consequence of some error in the original work. Yet, though not perfect, scientific indices do provide some quantification of the productivity and quality of papers, researchers, institutions and even countries and continents. The common-sense approach, given the unavoidable limitations of the indices, is not to dismiss them, but to try to identify their faults so that they can be further improved. And, little wonder, the best bet is to use science to improve the scientific indices.

It is a positive sign of our age that a relatively great effort, reflected in a growing number of related publications (e.g. [...]), has been devoted to the systematic use of science for studying and enhancing scientific indices. One of the most interesting recent developments was Hirsch's proposal of the h-index [9]. Having sorted the articles of a researcher in decreasing order of citations, the h value is the largest position h along this sequence such that the article at position h still has at least h citations; equivalently, h of the articles have at least h citations each. This index has several advantages [...] with respect to other more traditional indicators, including the fact that the h-index does not take into account the citation tail (i.e. works with few citations) and is more robust than the total number of citations per author against inflation by sporadic joint publications with famous researchers [19]. However, Hirsch has also acknowledged that the h-index is potentially biased by factors such as the number of authors and the specific scientific area [...]. Several additional specific shortcomings of the h-index have been identified, and efforts have been made at their respective correction (e.g. [...]). Still, the h-index is indeed an interesting alternative which deserves continuing attention aimed at possible further refinements. Growing attention has also been focused on the dynamical aspects of the evolution of the h-index (e.g. [...]), as well as on the joint consideration of the evolution of author and article networks (e.g. [...]). Another interesting trend is the comparison of the h-index with more standard scientometric indicators, including peer judgements (e.g. [...]).

The total number of citations of an author can be roughly estimated from the h-index as

$$C_t \approx a\,h^2, \qquad$$

[...], where a is a constant empirically found to lie between 3 and 5. Though not precise, this relationship helps to explain the greater stability of the h-index when compared to the traditional total number of citations C_t: since h grows only with the square root of C_t, a given relative change in C_t implies only about half that relative change in h. At the same time, the above relationship is not perfect, otherwise the h-index would be but a transformed version of the total number of citations. Another interesting measurement proposed by Hirsch [9] is the m-index, defined by the linear model $h \approx m\,n$, where n is the number of elapsed periods (usually years). Therefore, m corresponds to the approximate (mean or instantaneous) rate of increase of the h-index with time. An m-index of 3 obtained for a researcher, for instance, suggests that his or her h-index will reach about 30 after 10 years.
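The two empirical relations above lend themselves to quick back-of-the-envelope estimates. The following sketch is only illustrative: the function names are hypothetical, and a = 4 is an arbitrary mid-range choice within the reported interval from 3 to 5. The last line shows the damping effect of the square root: doubling C_t multiplies h by only about 1.414.

```python
import math


def citation_range(h, a_min=3.0, a_max=5.0):
    """Rough range of total citations C_t ~ a * h**2 for a in [a_min, a_max]."""
    return a_min * h ** 2, a_max * h ** 2


def h_from_citations(c_total, a=4.0):
    """Invert C_t ~ a * h**2 (a = 4 is an illustrative mid-range value)."""
    return math.sqrt(c_total / a)


def h_after(m, n_years):
    """Linear model h ~ m * n, with n the number of elapsed years."""
    return m * n_years


print(citation_range(10))     # (300.0, 500.0)
print(h_from_citations(400))  # 10.0
print(h_after(3, 10))         # 30
print(h_from_citations(800) / h_from_citations(400))  # about 1.414
```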

Several related investigations, including Hirsch's original work [9], assume that articles receive a fixed number c of citations per year. While it would be possible to consider a time window for citations, it is also interesting to take into account preferential citation rules such as those adopted in complex networks research (e.g. [...]). According to this model, nodes which have many connections (e.g. citations) tend to attract more connections, giving rise to the 'rich get richer' paradigm. Another important aspect which has been relatively overlooked is the presence of communities in the scientific world (e.g. [...]). There are several possible origins for such communities, including the area of research, language of publication, age and style, among many others.

The present work reports an investigation of the simulated dynamics of the h-index considering a variable number c of citations received per article, defined by preferential attachment. As such, this work represents one of the first approaches integrating the h-index and complex networks. However, we believe its main contributions lie elsewhere, mainly in the consideration of two communities [20], henceforth abbreviated as A and B, with distinct fitness values and under the realistic dynamics of preferential attachment, as well as in the assumption that the number of papers published by each author follows Zipf's law (e.g. [...]). These two communities produce articles with respective fixed fitness indices f_A and f_B. In order to reflect some inherent difference between the two communities (e.g. as a consequence of researcher age, writing style, language or specific area, or, more likely, combinations of these), we impose that f_A = 2 f_B, i.e. the articles in community A are inherently twice as citable as those produced by the other community. Note that any of the above criteria can be used to separate the citation network into two or more subgraphs, e.g. by establishing respective thresholds [21]. It is also important to emphasize at the outset that the presence of these two (or more) communities is a working assumption rather than an established fact. The same can be said about the possible origin of the fitness difference. It is hoped that the present work can support the eventual identification of such distinct communities from the observation of the h-indices of the respective authors.

The simulated dynamics extends over 20 years. Because of computational restrictions, the number of authors is limited to 78 which, under the adopted Zipf's law configuration, implies a total of 302 papers per year. Each article is assumed to make a fixed number w of citations to other works, self-citations included. For simplicity's sake, the number of papers published per year by each author, as well as the number of authors, are also considered fixed, which is not a great drawback given the relatively short period of the simulation (i.e. 20 years). Despite its unavoidable simplifications, the suggested model provides a number of remarkable results and trends, which are identified and discussed, including bleak perspectives for the community with smaller fitness (B). A brief discussion is also provided concerning possible strategies to be adopted by community B in order to improve its overall h-indices. Based on the simulation results, yet another enhanced version of the h-index is outlined, relying on the identification of virtual citations in terms of the number of shared main features of each work (e.g. revealed by statistics or artificial intelligence).

The article starts by defining the models and then presents the results obtained for two values of w under uniform and preferential attachment. Possible means to change the situation of community B are then considered, and a new version of the h-index is proposed. The work concludes by summarizing the main findings and suggesting perspectives for future work.
II. THE MODELS

The number of articles published per year by each author i, henceforth y(i), is assumed to follow Zipf's distribution [18], i.e.

$$p(y) = c\,y^{\beta}, \qquad (1)$$

where p(y) is the (unnormalized) distribution of y, i.e. the number of authors publishing y papers per year, and β and c are real parameters. We define the specific form of this relation (i.e. its parameters) by establishing the two extreme points (y, p(y)) = (1, m) and (s, 1) of the respective distribution. In other words, we assume that m authors publish only one paper per year and only one author publishes s papers per year. Therefore, we have that

$$c = m, \qquad (2)$$

$$\beta = -\frac{\log(m)}{\log(s)}. \qquad (3)$$

It is henceforth assumed that m = 15 and s = 30, so that β ≈ -0.796. In addition, we have to sample from this distribution. Without great loss of generality, we chose y = (1, 2, 3, 5, 10, 15, 30) and consequently obtain p(y) = (15, 9, 6, 4, 2, 2, 1). In other words, 15 authors publish one article per year, 9 authors publish 2 articles per year, and so on. This leads to N_A = 39 authors and a total of 151 yearly articles per community. Such a configuration is assumed for both communities A and B, implying a grand total of 78 authors and 302 papers per year.
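Equations (1) to (3) and the resulting author and article counts can be checked numerically. The sketch below assumes that p(y) was rounded to the nearest integer; the text does not say how the values were discretized, but this rule reproduces the counts (15, 9, 6, 4, 2, 2, 1). The function name `zipf_counts` is hypothetical.

```python
import math


def zipf_counts(m=15, s=30, ys=(1, 2, 3, 5, 10, 15, 30)):
    """Number of authors p(y) publishing y papers per year, eqs. (1)-(3)."""
    c = m                              # eq. (2): p(1) = m
    beta = -math.log(m) / math.log(s)  # eq. (3): p(s) = 1
    # eq. (1), rounded to whole authors (rounding rule is an assumption)
    counts = [round(c * y ** beta) for y in ys]
    return beta, counts


ys = (1, 2, 3, 5, 10, 15, 30)
beta, counts = zipf_counts(ys=ys)
n_authors = sum(counts)                                 # authors per community
n_articles = sum(y * n for y, n in zip(ys, counts))     # yearly articles per community
print(round(beta, 3), counts, n_authors, n_articles)
# -0.796 [15, 9, 6, 4, 2, 2, 1] 39 151
```

Doubling the last two numbers for the two identical communities gives the 78 authors and 302 yearly papers used throughout.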

In order to represent the citation network, we adopt a directed network (i.e. a digraph) defined as Γ = (V, Q), where V is the set of N vertices (or nodes) representing the articles and Q is the set of E edges (or links) connecting the nodes (i.e. the citations). Note that both V and Q vary along the 20 considered years. A citation from an article j to another article i is represented as (j, i) and stored into the adjacency matrix K as K(i, j) = 1 (a null entry is imposed otherwise). The number of citations received by each article i is therefore given by the indegree of the respective node, i.e.

$$k(i) = \sum_{r=1}^{N} K(i, r). \qquad (4)$$
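Equation (4) amounts to summing each row of K. A minimal sketch with plain Python lists, following the convention K(i, j) = 1 when article j cites article i; the four-article citation list and the function names are hypothetical.

```python
def adjacency_from_citations(n, citations):
    """Build K with K[i][j] = 1 for every citation (j, i), i.e. j cites i."""
    K = [[0] * n for _ in range(n)]
    for j, i in citations:
        K[i][j] = 1
    return K


def indegrees(K):
    """Eq. (4): k(i) = sum over r of K(i, r), the citations received by article i."""
    return [sum(row) for row in K]


# Articles 1, 2 and 3 cite article 0; article 3 also cites article 1.
K = adjacency_from_citations(4, [(1, 0), (2, 0), (3, 0), (3, 1)])
print(indegrees(K))  # [3, 1, 0, 0]
```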

Although the two communities present identical structure as far as the number of authors and the respective number of articles published per year are concerned, the fitness of the articles produced by community A is taken to be twice as large as that of the articles published by community B, i.e. f_A = 2 f_B. We henceforth assume that f_B = 1, so that f_A = 2. These values are used in order to bias the establishment of the links during the simulations, as explained below.

The growth of the citation network is performed in yearly steps. Four dynamics are considered for comparison purposes: (i) UNI, uniform; (ii) PREFF, preferential to community fitness; (iii) PREFC, preferential to existing article citations; and (iv) DBPREF, preferential to both community fitness and existing citations. Each of them is described in the following. Although none of the models considered in this article includes a citation time window, this is not a great shortcoming given the relatively short period of the simulation (i.e. 20 years).

In the UNI model, each of the 302 articles added each year is assumed to cite exactly w articles chosen uniformly at random among those already published up to the current year. We consider two situations, defined by w = 5 and w = 20. The PREFF model is similar to the UNI scheme, but now the new citations take into account the community fitness. As a consequence, articles from community A become twice as likely to be cited as those from community B. The PREFC model is also preferential, but here each of the w citations per article is established preferentially to the number of existing citations of each article published from the beginning to the current year. This model is therefore similar to the Barabási-Albert model (e.g. [...]), except that the indegrees (i.e. numbers of citations) are not updated during the year, but only at its end. Finally, the DBPREF model is doubly preferential, to both existing citations and communities. More specifically, a list is kept where the identification of each article is entered a number of times equal to its number of incoming citations multiplied by the community fitness (i.e. f_A = 2 for community A and f_B = 1 for community B). New citations are then chosen by uniform random selection among the elements of this list. Each configuration was run 50 times in order to provide statistical representativeness, and the h-index and the total number of citations N_t were calculated for each author at each year.
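The four citation rules differ only in the weight each existing article receives when a new citation is drawn; drawing uniformly from the DBPREF list described above is equivalent to drawing with probability proportional to (citations × fitness). The sketch below is a simplified illustration, not the authors' code, and all function names are hypothetical. It makes three assumptions that the text does not specify: an offset of 1 is added to the citation counts in PREFC and DBPREF so that still-uncited articles can be chosen; each new article cites w distinct articles; and articles published in the current year only become citable from the next year on.

```python
import random

FITNESS = {"A": 2.0, "B": 1.0}  # f_A = 2, f_B = 1, as in the text


def citation_weights(citations, community, model, offset=1):
    """Relative probability of each existing article being cited.

    citations[i]: citations of article i at the end of the previous year.
    community[i]: "A" or "B".
    offset: ASSUMED constant so that uncited articles can be chosen.
    """
    if model == "UNI":
        return [1.0] * len(citations)
    if model == "PREFF":
        return [FITNESS[c] for c in community]
    if model == "PREFC":
        return [k + offset for k in citations]
    if model == "DBPREF":
        return [(k + offset) * FITNESS[c] for k, c in zip(citations, community)]
    raise ValueError(f"unknown model: {model}")


def pick_cited(weights, w, rng):
    """Draw up to w distinct articles with probability proportional to weights."""
    weights = list(weights)
    chosen = []
    for _ in range(min(w, sum(1 for x in weights if x > 0))):
        i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        chosen.append(i)
        weights[i] = 0.0  # cite each article at most once
    return chosen


def run_year(citations, community, new_communities, w, model, rng):
    """One year: every new article cites w earlier articles; the citation
    counts (indegrees) are only updated at the end of the year."""
    weights = citation_weights(citations, community, model)
    received = [0] * len(citations)
    for _ in new_communities:
        for i in pick_cited(weights, w, rng):
            received[i] += 1
    updated = [k + r for k, r in zip(citations, received)]
    # This year's articles join the pool with zero citations.
    return updated + [0] * len(new_communities), community + list(new_communities)


# Tiny run: 4 articles per community per year, w = 2, three years.
rng = random.Random(0)
cits, comm = [], []
for _ in range(3):
    cits, comm = run_year(cits, comm, ["A"] * 4 + ["B"] * 4, 2, "DBPREF", rng)
print(len(cits), sum(cits))  # 24 32
```

The configuration in the text corresponds to 151 new articles per community per year, w = 5 or 20, over 20 years; this naive weighted draw becomes slow at that scale. Per-author h-indices would follow by grouping the article counts by author and applying an h-index function to each group.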



III. SIMULATION RESULTS AND DISCUSSION 

Figure 1(a) shows the evolution of the h-indices for the seven considered types of authors (authors publishing the same number of articles per year, y ∈ {1, 2, 3, 5, 10, 15, 30}, have similar dynamics and are averaged together) in community A or B under the UNI dynamics while assuming w = 5. The analogous results obtained for the PREFF dynamics for communities A and B are given in Figures 1(c) and (e), respectively. Figures 1(b, d, f) give the respective results obtained for w = 20. It is clear from Figure 1 that the h-indices of all types of authors tend to increase monotonically with time, though at different rates. Actually, as revealed by some elementary reasoning, the citations received by each author tend to increase linearly with the years. This is a direct consequence of the adopted indiscriminate citation scheme: each author receives, on average, a fixed number of citations per year (under UNI, about y·w for an author publishing y articles per year). Therefore, recalling that C_t ≈ a h^2, the h-indices will be roughly proportional to the square root of the years. In addition, the h-indices of each type of author directly reflect its yearly production.
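The square-root behaviour can be made explicit with a short derivation for the UNI model. It assumes that an author publishes y of the 302 articles added each year, and that Hirsch's empirical relation C_t ≈ a h^2 from the Introduction also holds for the simulated authors; the latter is an assumption rather than a result of the simulations.

```latex
% UNI: 302 w new citations per year; the author owns a fraction y/302 of all articles
\mathbb{E}[\Delta C_t] = 302\,w \cdot \frac{y}{302} = y\,w
% accumulated over n years
C_t(n) \approx y\,w\,n
% combined with C_t \approx a\,h^2
h(n) \approx \sqrt{\frac{y\,w\,n}{a}} \propto \sqrt{n}
```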

Because of the linear rate of increase of the citations per type of author, these models are of limited interest, except as a comparison standard for the other models, which consider citations preferential to the number of existing citations. In particular, note that the evolution of the h-indices in the case of identical community fitness values (shown in (a) for w = 5 and (b) for w = 20) is not too different from that obtained for different fitness values (shown in (c, e) for w = 5 and (d, f) for w = 20). For instance, the most productive author in community A would reach an h-index of 13 after 20 years in case the two communities were identical, and an h-index of 18 after the same period in case its community had twice as much fitness as community B. In other words, the different fitness values have relatively little effect on the relative evolution of the h-indices.

Figure 2(a) shows the evolution of the h-indices for the seven considered types of authors in community A or B under the PREFC dynamics while assuming w = 5. The analogous results obtained for the DBPREF dynamics for communities A and B are given in Figures 2(c) and (e), respectively. Figures 2(b, d, f) give the respective results obtained for w = 20. Recall that all these simulations consider citations preferential to the current total number of citations of each article ('rich get richer'). All curves are characterized by a non-linear portion along the first years, followed by a nearly linear evolution. Also, as in the indiscriminate case, the h-indices of the 7 types of authors tend to reflect their yearly production. As could be expected, the standard deviations tend to increase with the productivity of the author type.

Let us first discuss the situation arising for w = 5. Note that a markedly sharper increase of the h-indices occurs along the first 4 or 5 years for the most productive author types. When no distinction is made between the fitness values of the two communities (i.e. model PREFC; see Figure 2(a)), the h-indices of the 7 types of authors tend to evolve steadily until reaching, at year 20, the configuration shown in line 1 of Table I. Now, in the case of different fitness values for the two communities (model DBPREF), the evolution of the h-indices is much steeper for community A (Fig. 2(c)) than for community B (Fig. 2(e)). The h-indices reached after 20 years by the 7 types of authors in communities A and B in this case are given in lines 2 and 3 of Table I, respectively. The ratios between the h-indices of communities A and B with different fitness values and the h-index values in the case of equal fitness are given in lines 4 and 5 of Table I, respectively.

Strikingly, while the different fitness of community A leads to moderate increase ratios, ranging from about 1.17 to 1.40, the effect is catastrophic for community B, with ratios decreasing from 0.56 (AT1) to 0.37 (AT7). The reason for such dynamics is that, with the progress of the years, the articles in community A become ever more cited and competitive, diverting most of the citations that would otherwise be established within community B. This is a situation where, though the rich do not get so much richer, the poor become irreversibly poorer, as the preferential effect will continue until virtually no citations take place yearly inside community B. An even more acute situation would have been observed in the likely case that the fitness of community A increased with its overall growing h-indices. As is visible in Figure 2(e), this same effect slightly contributes to leveling the h-index values among the individuals in community B.

line  content                              AT1    AT2    AT3    AT4    AT5    AT6    AT7
1     w = 5, equal fitness (PREFC)         4.6    7.0    8.7    11.5   16.4   19.2   26.1
2     w = 5, community A (DBPREF)          5.4    8.2    10.5   14.2   21.5   25.9   36.6
3     w = 5, community B (DBPREF)          2.6    3.5    4.3    5.3    6.6    7.4    9.7
4     ratio line 2 / line 1                1.174  1.171  1.207  1.235  1.311  1.349  1.402
5     ratio line 3 / line 1                0.56   0.50   0.49   0.46   0.40   0.39   0.37
6     w = 20, equal fitness (PREFC)        5.8    9.1    11.7   16.6   25.6   32.6   50.7
7     w = 20, community A (DBPREF)         6.1    9.9    13.3   18.1   29.5   38.2   58.9
8     w = 20, community B (DBPREF)         2.9    4.2    5.2    6.9    11.0   14.7   21.8
9     ratio line 7 / line 6                1.052  1.088  1.137  1.090  1.152  1.172  1.162
10    ratio line 8 / line 6                0.50   0.46   0.44   0.42   0.43   0.45   0.43

TABLE I: The h-indices of the 7 types of authors after 20 years and respective ratios. See text for explanation. Each of the author types i is identified as ATi.
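Lines 4 and 9 of Table I can be recomputed directly from lines 1, 2, 6 and 7. A small Python check, with the values copied from the table (the variable and function names are ours):

```python
# h-indices after 20 years from Table I, author types AT1..AT7.
w5_equal = [4.6, 7.0, 8.7, 11.5, 16.4, 19.2, 26.1]     # line 1
w5_A = [5.4, 8.2, 10.5, 14.2, 21.5, 25.9, 36.6]        # line 2
w20_equal = [5.8, 9.1, 11.7, 16.6, 25.6, 32.6, 50.7]   # line 6
w20_A = [6.1, 9.9, 13.3, 18.1, 29.5, 38.2, 58.9]       # line 7


def ratios(numerator, denominator, digits=3):
    """Element-wise ratio between two table lines, rounded as in Table I."""
    return [round(a / b, digits) for a, b in zip(numerator, denominator)]


print(ratios(w5_A, w5_equal))    # line 4
print(ratios(w20_A, w20_equal))  # line 9
```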

The situation for w = 20 is largely similar to that discussed above for w = 5, with the following differences. First, a short plateau of h-index values appears along the first years, especially for the most productive authors in the case of equal fitness (Figure 2(b)) and for community A with different fitness (Figure 2(d)). Second, the relative increase of the h-indices with respect to the equal fitness case (i.e. the ratios of lines 7 and 8 to line 6) is given in lines 9 and 10 of Table I. While only a minimal increase, ranging between 1.052 and 1.172, is obtained for community A in the case of different fitness values, the ratio for community B ranges between 0.42 and 0.50. In addition, the exhaustion of the citations inside community B is now clearly visible in the saturation of the h-indices in Figure 2(f).

FIG. 1: The h-indices for the seven considered types of authors obtained for either of the two communities with the UNI model (a), and the indices obtained for communities A (c) and B (e) under the PREFF model, for w = 5. The analogous results obtained for w = 20 are given in (b) and (d, f).

IV. STRATEGIES FOR IMPROVING
INDIVIDUAL h-INDICES

Given the largely unfair dynamics identified for the authors in community B, it becomes interesting to consider by which means this situation could be, at least partially, improved. Of course, in case the fitness difference were a direct consequence of the quality of the publications in community B, the immediate answer would be that the authors in that community should try to improve their standards or be indeed doomed. However, in case the differences of fitness have a more arbitrary and biased origin, it becomes justifiable to consider means to correct the situation. The following three possibilities, which are by no means exhaustive, could be considered:

A bit more attention from the richer: Authors in community A try to cite those in B more frequently. The main advantage of this solution is that the authors in community A would lose only a little, while those in B would gain a lot with respect to the uneven fitness situation. After all, citations should be based only on the inherent quality and contribution of each work.

Collaborative strategy: Authors in B participate as co-authors with community A. Although such a practice would tend to enhance the h-index values in community B, such an increase would be limited by the high resilience of the h-index with respect to such initiatives.

A bit more attention among the poorer: In this case, the authors in community B would pay greater attention to the work of their colleagues, trying to reduce the effect of the different fitness values on the preferential citations. Again, this should reflect the inherent quality and contributions of each work.



V. TOWARDS MORE COMPREHENSIVE 
CITATION INDICES 

Although creative proposals such as the h-index and its enhanced variations do provide interesting advantages for measuring the significance of scientific publishing, they can still be biased by several factors, including the presence of communities with varying citation fitness which, as shown in Section III, can lead to critical situations. In the light of the obtained results, it is interesting to consider some possible modifications and enhancements to the h-index, as addressed in the following.

First, we have to go back to the reasons why citations exist after all, which mainly include: (a) establishing the context of the research; (b) providing additional information about the adopted concepts and methods; and (c) comparing methodologies or results. All such cases can be conveniently unified into the following criterion:

• Citations should be included in order to complement the work in question. As such, all citations should be directly related to the main aspects developed in each new article.

Now, it happens that the relationship between any two articles can be automatically inferred, to some degree of accuracy, by using artificial intelligence methods combined with the ever-increasing online access to high-quality scientific databases and repositories. One of the simplest approaches involves counting how many keywords are shared by each pair of articles. In order to define the direction of the citations (actually, their causality), the new article would naturally be linked to older entries in the databases. The number of implied citations would naturally vary with the comparison methodology and the adopted thresholds, but would nevertheless provide a less arbitrary and more complete means of obtaining comprehensive and less biased citations, from which the respective h-index could be calculated. Actually, after some further reflection it becomes clear that such a citation system offers a series of additional advantages, including:

1. Inherently linked to bibliographical research: One of the preliminary steps in every article is to perform a reasonably complete search of existing related works, the so-called bibliographical search. It would be interesting to use the same system(s) for both bibliographical search and automatic citation, ensuring consistency.

2. More substantive evaluation: Provided good journals (e.g. with reasonable impact factors) are considered for the databases, the quality of the cited works would be at least partially assured. Indeed, a given article is more likely to have been read and evaluated by the referees of a good journal than by a possibly hurried author seeking contextual references. After all, citations are known to sometimes be copied from the reference lists of related previous articles (e.g. [...]).

3. Avoidance of personal biases: Because the virtual citations would be established from databases while considering objective keywords, no space is left for personal biases.

4. Quantification of the quality of the work: With the advance of more sophisticated intelligent computer systems, it will become possible for the automatic citation system to try to quantify several important qualities of an article, including originality, clarity and grammar, and even to detect fraud.
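In its simplest form, the keyword-overlap rule discussed in this section can be sketched as follows. Everything here is illustrative: the threshold `min_shared`, the article records and the function names are hypothetical, and practical systems would rely on richer statistical or artificial intelligence comparisons. Only strictly older articles can receive a virtual citation, which fixes the direction of each link; the h-index is then computed on the virtual citation counts.

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count < position:
            break
        h = position
    return h


def virtual_citations(articles, min_shared=2):
    """Count the virtual citations received by each article.

    articles: list of (year, keywords) pairs, keywords being a set of strings.
    A newer article virtually cites an older one when both share at least
    min_shared keywords (hypothetical threshold).
    """
    received = [0] * len(articles)
    for year_new, kw_new in articles:
        for i, (year_old, kw_old) in enumerate(articles):
            if year_old < year_new and len(kw_old & kw_new) >= min_shared:
                received[i] += 1  # the newer article cites article i
    return received


articles = [
    (2000, {"networks", "citations", "h-index"}),
    (2001, {"networks", "citations", "zipf"}),
    (2002, {"networks", "citations", "h-index"}),
    (2003, {"zipf", "production"}),
]
counts = virtual_citations(articles)
print(counts)           # [2, 1, 0, 0]
print(h_index(counts))  # 1
```

Raising `min_shared` makes the linking stricter and reduces the number of virtual citations, which mirrors the dependence on adopted thresholds noted in the text.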

It cannot be claimed that automatic citation can be easily accomplished or that it will be fully precise from the beginning, but it can certainly provide a second, complementary indication to be taken into account jointly with more traditional scientometric indices. At the same time, the continuing advances of multivariate statistics and artificial intelligence will contribute to achieving ever more intelligent and versatile automatic citation and indexing systems.

FIG. 2: The h-indices for the seven considered types of authors obtained for either of the two communities with the PREFC model (a), and the indices obtained for communities A (c) and B (e) under the DBPREF model, for w = 5. The analogous results obtained for w = 20 are shown in (b) and (d, f).

VI. CONCLUDING REMARKS AND FUTURE
WORKS

For any artificial process to be improved, it is imperative to quantify its performance in the most objective and unbiased way possible. Scientific citations, properly normalized by area and number of authors and always under the auspices of common sense, are no exception to this rule. Since the first printed scientific and technical works, authors and readers have been involved in an ever-evolving complex system of citations aimed at contextualizing and complementing each piece of reported research. Though indicators such as the total number of published articles per author, the total number of citations, or the citations per article, amongst many others, have been systematically used for promotions, grants and the identification of scientific trends, there is still no perfect index. Recently introduced by Hirsch [9], the h-index presents a series of interesting advantages over more traditional indicators, as well as some specific shortcomings which have been progressively addressed.

At the same time as scientometrics progresses healthily and inexorably, it is important to stick to the original aims of scientific publication, namely the dissemination of new findings in order to foster even further development. In order to complement and enhance reported works, it is essential to provide significant and unbiased citations which can properly contextualize and complement each piece of work. Primarily, each citation is an acknowledgement of a previous work, contributing to its significance and to the recognition of its author. However, because scientometrics increasingly determines the course of science, it is critically important to continually revise and improve the respective indices.

The present work has addressed the dynamical evolution of the h-index over a limited period of time (20 years) in a citation network involving two communities whose authors' yearly productions follow a particular configuration of Zipf's law. Other distinguishing features of the reported models include the consideration of citations preferential to an inherent fitness value assigned to each community, as well as to the existing number of citations. Although the number of papers published per year by each author remains constant, two different numbers of citations emanating from each article (i.e. w = 5 and 20) were considered separately.

Four types of models were considered in simulations involving 50 realizations of each configuration. A linear increase of citations was observed for the two models involving indiscriminate citations and citations preferential to the community fitness only. The two more realistic situations, assuming the citations to be preferential to the current number of citations of each paper, yielded particularly interesting results, especially the model where the citations were also preferential to the community fitness values. When compared to the evolution of the h-indices of the two communities evolving with citations preferential only to the number of citations, the model involving citations also preferential to the community fitness values showed that the authors in community A experienced a moderate increase in their h-indices, while the indices of the authors in community B suffered a severe reduction. It should be recalled that the presence of coexisting communities is but a hypothesis, to be eventually confirmed through additional experimental work.

Having identified such trends in multiple-community citation systems, we briefly discussed three strategies which could be adopted in order to compensate for the different fitness values. In addition, an improved approach has been outlined which can provide a complementary characterization of the significance and productivity of authors or groups. More specifically, it has been suggested that statistical and artificial intelligence methods be used in order to identify virtual citations from each new work to previous works stored in databases, while taking into account the overlap of key features (e.g. keywords, main contributions, etc.) between the new and previous works. A number of further advantages have been identified for this approach.

Future extensions of the present work include the consideration of larger numbers of authors and of more than two coexisting communities, as well as the investigation of possible border effects implied by the relatively small size of the adopted networks. It would also be interesting to perform simulations taking into account longer periods of time, citation time windows (e.g. no citations to articles older than a given threshold), and the progressive addition and retirement of authors.

Scientometrics corresponds to a peculiarly interesting circular application of science to improve itself through the proposal of ever more accurate and unbiased indices and measurements. While the advances of computing have implied an inexorably increasing number of articles and new results, it is suggested that they also hold the key, in the form of artificial intelligence, to the proper quantification of scientific productivity and quality. After all, as hinted in the quotation at the beginning of this work, if human attention is becoming so scarce, perhaps automated digital attention can at least provide some complement.